
Conversation

@Aminsed
Contributor

@Aminsed Aminsed commented Oct 13, 2025

Add documentation explaining the temp_storage_bytes query pattern.

  • Clarify the two-phase usage pattern
  • Document which arguments are required vs. optional
  • Explain that pointers can be nullptr during the query
  • Add an example showing proper usage

Fixes #847

@Aminsed Aminsed requested a review from a team as a code owner October 13, 2025 15:18
@github-project-automation github-project-automation bot moved this to Todo in CCCL Oct 13, 2025
@Aminsed Aminsed requested a review from gonidelis October 13, 2025 15:18
@copy-pr-bot
Contributor

copy-pr-bot bot commented Oct 13, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Oct 13, 2025
../api/device


Determining Temporary Storage Requirements
Contributor


It would be nice to also mention the single-phase API.

@Aminsed Aminsed force-pushed the doc/temp_storage_bytes_guide branch from 1dc3643 to 11abc08 Compare October 13, 2025 20:09
@Aminsed Aminsed requested a review from fbusato October 13, 2025 20:10
Member

@gonidelis gonidelis left a comment


Single-phase APIs open up a whole new area to explain in the documentation, but the two-phase clarifications are much needed. Thanks a lot for taking the time to provide docs for it.

@github-project-automation github-project-automation bot moved this from In Review to In Progress in CCCL Oct 14, 2025
@Aminsed Aminsed requested a review from gonidelis October 26, 2025 19:55
@Aminsed
Contributor Author

Aminsed commented Nov 1, 2025

@gonidelis

Comment on lines 27 to 28
* **Can be nullptr/uninitialized**: All input/output pointers (``d_in``, ``d_out``, etc.)
* **Note**: The algorithm does not access input data during the query phase
Contributor


Are we actually providing this guarantee? Can you point at a place from which you derive this fact?

Hypothetical scenario: we could determine the temporary storage based on the alignment of another input pointer. AFAIK we don't do that, but currently, we could.

However, since we seem to be vague about what's required of all parameters that do not take part in the query phase, maybe we should just define what's being suggested here. But that requires some broader approval and probably a review of all existing APIs.

@gevtushenko what do you think?

Contributor Author


@bernhardmgruber Thanks for flagging this great point. I re-audited the device-wide dispatch layer to make sure we're not overpromising. Every dispatcher we ship

(dispatch_reduce*.cuh, dispatch_scan*.cuh, dispatch_select_if.cuh,
dispatch_histogram.cuh, dispatch_radix_sort.cuh, dispatch_merge*.cuh,
dispatch_rle.cuh, dispatch_unique_by_key.cuh, dispatch_three_way_partition.cuh,
dispatch_topk.cuh, dispatch_adjacent_difference.cuh, dispatch_batch_memcpy.cuh)

exits immediately when d_temp_storage == nullptr: no kernels launch and no user pointers are dereferenced. I've updated the “What arguments are needed during the query phase?” bullets to call that out explicitly and list the audited dispatchers. Please let me know if you’d like me to add anything else or tighten the wording further.
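
For readers following the thread, the early-exit behavior described above can be sketched on the host without any CUDA dependency. This is only an illustration of the calling convention, not CUB code: `mock_device_sum`, `run_two_phase_sum`, the 256-item block size, and the "one partial sum per block" scratch requirement are all hypothetical names and assumptions made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// HYPOTHETICAL host-only stand-in for a CUB-style device algorithm.
// This is not a CUB function; it only mimics the two-phase convention.
// Returns 0 on success, 1 if the supplied scratch buffer is too small.
int mock_device_sum(void* d_temp_storage, std::size_t& temp_storage_bytes,
                    const int* d_in, int* d_out, int num_items)
{
  constexpr int block_size = 256; // assumed tile size, for illustration only
  const int num_blocks = (num_items + block_size - 1) / block_size;

  // Assumed requirement: one partial sum per block, and never less than one
  // slot, so a real allocation is never mistaken for a size query.
  const std::size_t required =
    static_cast<std::size_t>(num_blocks > 0 ? num_blocks : 1) * sizeof(int);

  if (d_temp_storage == nullptr)
  {
    // Query phase: report the size and return. d_in and d_out are not read.
    temp_storage_bytes = required;
    return 0;
  }
  if (temp_storage_bytes < required)
  {
    return 1;
  }

  // Run phase: per-block partial sums go into the scratch buffer first.
  int* partials = static_cast<int*>(d_temp_storage);
  for (int b = 0; b < num_blocks; ++b)
  {
    const int begin = b * block_size;
    const int end   = (begin + block_size < num_items) ? begin + block_size : num_items;
    int sum         = 0;
    for (int i = begin; i < end; ++i)
    {
      sum += d_in[i];
    }
    partials[b] = sum;
  }

  int total = 0;
  for (int b = 0; b < num_blocks; ++b)
  {
    total += partials[b];
  }
  *d_out = total;
  return 0;
}

// Caller side: query the size, allocate, then call again with the same arguments.
int run_two_phase_sum(const std::vector<int>& input)
{
  const int num_items            = static_cast<int>(input.size());
  std::size_t temp_storage_bytes = 0;

  // Phase 1: only the temp-storage pointer (nullptr) and the size reference matter.
  int status = mock_device_sum(nullptr, temp_storage_bytes, nullptr, nullptr, num_items);
  assert(status == 0);

  // Phase 2: allocate exactly what was reported and run the algorithm.
  std::vector<int> temp(temp_storage_bytes / sizeof(int));
  int result = 0;
  status     = mock_device_sum(temp.data(), temp_storage_bytes, input.data(), &result, num_items);
  assert(status == 0);
  (void) status; // silence unused-variable warnings when asserts are disabled
  return result;
}
```

In this sketch the query call passes nullptr for both the input and output pointers, which is exactly the usage the proposed wording would permit. Whether CUB formally guarantees that for every device API is the open question in this thread.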

Contributor


I don't think we should list the specific implementations, but rather provide a general guarantee.

@gevtushenko can we agree that any arguments, except for the temporary storage pointer and size reference, are not inspected during a size query call of a CUB device API?


Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

Documentation (& code) regarding determining temp_storage_bytes not very clear

4 participants